On pattern matching with k mismatches and few don't cares
نویسندگان
چکیده
We consider the problem of pattern matching with k mismatches, where there can be don't care or wild card characters in the pattern. Specifically, given a pattern P of length m and a text T of length n, we want to find all occurrences of P in T that have no more than k mismatches. The pattern can have don't care characters, which match any character. Without don't cares, the best known algorithm for pattern matching with k mismatches has a runtime of [Formula: see text]. With don't cares in the pattern, the best deterministic algorithm has a runtime of O(nk polylog m). Therefore, there is an important gap between the versions with and without don't cares. In this paper we give an algorithm whose runtime increases with the number of don't cares. We define an island to be a maximal length substring of P that does not contain don't cares. Let q be the number of islands in P. We present an algorithm that runs in [Formula: see text] time. If the number of islands q is O(k) this runtime becomes [Formula: see text], which essentially matches the best known runtime for pattern matching with k mismatches without don't cares. If the number of islands q is O(k2), this algorithm is asymptotically faster than the previous best algorithm for pattern matching with k mismatches with don't cares in the pattern.
منابع مشابه
A Filtering Algorithm for k -Mismatch with Don't Cares
We present a filtering based algorithm for the k-mismatch pattern matching problem with don’t cares. Given a text t of length n and a pattern p of length m with don’t care symbols in either p or t (but not both), and a bound k, our algorithm finds all the places that the pattern matches the text with at most k mismatches. The algorithm is deterministic and runs in Θ(nmk logm) time.
متن کاملPattern matching with don't cares and few errors
We present solutions for the k-mismatch pattern matching problem with don’t cares. Given a text t of length n and a pattern p of length m with don’t care symbols and a bound k, our algorithms find all the places that the pattern matches the text with at most k mismatches. We first give an Θ (n(k + logm log k) log n) time randomised algorithm which finds the correct answer with high probability....
متن کاملMatching with don't-cares and a small number of mismatches
In matching with don’t-cares and k mismatches we are given a pattern of length m and a text of length n, both of which may contain don’t-cares (a symbol that matches all symbols), and the goal is to find all locations in the text that match the pattern with at most k mismatches, where k is a parameter. We present new algorithms that solve this problem using a combination of convolutions and a d...
متن کاملk -Mismatch with Don't Cares
We give the first non-trivial algorithms for the k-mismatch pattern matching problem with don’t cares. Given a text t of length n and a pattern p of length m with don’t care symbols and a bound k, our algorithms find all the places that the pattern matches the text with at most k mismatches. We first give an O(n(k + log n log log n) logm) time randomised solution which finds the correct answer ...
متن کاملApproximate String Matching with Variable Length Don ' t Care
Searching for DNA or amino acid sequences similar to a given pattern string is very important in molecular biology. In fact, a lot of programs and algorithms have been developed. Most of them are based on alignment of strings or approximate string matching. However, they do not seem to be adequate in some cases. For example, the DNA pattern TATA (known as TATA box) is a common promoter that oft...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- Information processing letters
دوره 118 شماره
صفحات -
تاریخ انتشار 2017